Efficient Pairwise Document Similarity Computation in Big Datasets

Similar Resources

An Efficient Document Indexing-Based Similarity Search in Large Datasets

In this paper, we propose a novel MapReduce-based approach for efficient similarity search in big data. Specifically, we address the drawbacks of using an inverted index in similarity search with MapReduce and then propose a simple yet efficient redundancy-free MapReduce scheme, which not only has advantages over the baseline inverted index-based procedures but...

Investigating Measures for Pairwise Document Similarity

The need for a more effective similarity measure is growing as a result of the astonishing amount of information being placed online. Most existing similarity measures are defined by empirically derived formulas and cannot easily be extended to new applications. We present a pairwise document similarity measure based on Information Theory, and present corpus dependent and independent applicatio...

Pairwise Document Similarity in Large Collections with MapReduce

This paper presents a MapReduce algorithm for computing pairwise document similarity in large document collections. MapReduce is an attractive framework because it allows us to decompose the inner products involved in computing document similarity into separate multiplication and summation stages in a way that is well matched to efficient disk access patterns across several machines. On a colle...

Efficient Graph-Based Document Similarity

Assessing the relatedness of documents is at the core of many applications such as document retrieval and recommendation. Most similarity approaches operate on word-distribution-based document representations that are fast to compute but problematic when documents differ in language, vocabulary, or type, and that neglect the rich relational knowledge available in Knowledge Graphs. In contrast, graph-based...

Efficient structural similarity computation between XML documents

This work is mainly motivated by the description of a new approach for calculating the structural similarity of XML documents. Practically, the majority of existing work on XML documents clustering considers the tree structures of these documents as mere vectors and, therefore, does not take into account their hierarchical contexts. Furthermore, in order to calculate the structural similarity o...

Journal

Journal title: International Journal of Database Theory and Application

Year: 2015

ISSN: 2005-4270

DOI: 10.14257/ijdta.2015.8.4.07